A fusion scheme of visual and auditory modalities for event detection in sports video
Authors
Abstract
In this paper, we propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, in which video shots are classified into several major or interesting classes, each with a clear semantic meaning. Within the major shot classes, we classify the different auditory signal segments (i.e., silence, hitting ball, applause, commentator speech) with the goal of detecting events with strong semantic meaning. For instance, for tennis video we have identified five interesting events: serve, re-serve, ace, return, and score. Since we have developed a unified framework for semantic shot classification in sports video and a set of audio mid-level representations with supervised learning methods, the proposed fusion scheme can be easily adapted to a new sports game. We are extending this fusion scheme to three additional typical sports videos: basketball, volleyball, and soccer. Correctly detected sports video events will greatly facilitate further structural and temporal analysis, such as sports video skimming and table-of-content generation.
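To make the fusion idea concrete, the following is a minimal Python sketch of how a mid-level shot class and the ordered audio keywords observed within that shot could be combined into a tennis event label. The shot-class names, audio keywords, and decision rules below are illustrative assumptions for exposition only; they are not the exact heuristics or learned models used in the paper.

from typing import List, Optional

# Assumed mid-level vocabularies (illustrative, not the paper's exact sets).
SHOT_CLASSES = {"court_view", "player_closeup", "audience", "replay"}
AUDIO_KEYWORDS = {"silence", "hitting_ball", "applause", "commentator_speech"}

def detect_tennis_event(shot_class: str, audio_sequence: List[str]) -> Optional[str]:
    """Fuse one shot's semantic class with the ordered audio keywords
    detected inside it, and return an event label or None."""
    if shot_class != "court_view":
        # Assume the events of interest occur only in court-view shots.
        return None

    hits = audio_sequence.count("hitting_ball")
    ends_with_applause = bool(audio_sequence) and audio_sequence[-1] == "applause"

    if hits == 1 and ends_with_applause:
        return "ace"     # single hit, point ends immediately with applause
    if hits == 1:
        return "serve"   # single hit, no rally follows in this shot
    if hits >= 2 and ends_with_applause:
        return "score"   # rally that ends the point
    if hits >= 2:
        return "return"  # ongoing rally within the shot
    return None

# Example: one court-view shot whose audio segments were classified as
# silence, a single ball hit, then applause.
print(detect_tennis_event("court_view", ["silence", "hitting_ball", "applause"]))  # -> "ace"

In the paper's actual pipeline, the shot classes and audio keywords would come from the supervised shot and audio classifiers; the rule table above only illustrates how the two mid-level modalities are fused. A re-serve event, for example, would additionally require tracking consecutive serve attempts within the same point, which this sketch omits.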
Similar works
Event Detection in Basketball Video Using Multiple Modalities
Semantic sports video analysis has attracted increasing attention recently. In this paper, we present a basketball event detection method that uses multiple modalities. Instead of using low-level features, the proposed method is built upon visual and auditory mid-level features, i.e., semantic shot classes and audio keywords. Promising event detection results have been achieved. By heuristically...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to detecting video events have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates that were not reachable by traditional approac...
Collaborative Multimedia Analysis for Detecting Semantical Events from Broadcasted Sports Video
In this paper, we present an approach to detecting semantical events from broadcasted sports video through collaborative multimedia analysis, called intermodal collaboration. Broadcasted video can be viewed as a set of multimodal streams such as visual, auditory, and textual (closed caption: CC) streams. Considering the temporal dependency between these streams, we aim to improve the reliabili...
Audio-Visual Event Localization in Unconstrained Videos
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos. We define an audio-visual event as an event that is both visible and audible in a video segment. We collect an Audio-Visual Event (AVE) dataset to systematically investigate three temporal localization tasks: supervised and weakly-supervised audio-visual event localization, and cross-modality l...
Multimodal Information Fusion for Semantic Video Analysis
Multimedia data by its very nature contains multimodal information. For a successful analysis of multimedia content, all available multimodal information should be utilized. Additionally, since concepts can provide valuable cues about other concepts, concept interaction is a crucial source of multimedia information and helps to increase fusion performance. The aim of this study is to ...